An Integrated System for Library Construction, Affinity Binder Screening and Expression Thereof

ABSTRACT

A recombinant polynucleotide suitable for use in a display vector is provided. The recombinant polynucleotide includes from 5′ to 3′: a first nucleic acid sequence (or insert) encoding an amino acid sequence to be displayed on a surface; a first pre-selected restriction site; a second nucleic acid sequence encoding a surface peptide capable of being displayed on the surface; and a second pre-selected restriction site. Corresponding display vectors that can be converted into expression vectors in a high-throughput fashion, as well as methods of use thereof, are also provided.

TECHNICAL FIELD OF THE INVENTION

This application pertains to the construction and screening of display libraries, particularly the design and use of vectors therefor.

BACKGROUND OF THE INVENTION

High-throughput screening for affinity binders, such as antibodies, that bind specifically to a target, such as an antigen, has been made possible by surface display technologies, including phage display, ribosomal/mRNA display, yeast display and mammalian display. Various display vectors have been developed such that, upon expression, libraries of peptides or proteins can be displayed on the surface of phage, bacteria, yeast or mammalian cells. Affinity binders can then be identified through library screening and selection processes such as phage panning or fluorescence-activated cell sorting (FACS). Thereafter, clones corresponding to the affinity binders must be further characterized to ascertain the identity of the binders, as well as for downstream applications.

Conventionally, inserts from binding clones are transferred or subcloned into expression vectors, either individually or en masse, so that the inserts can be expressed, purified and/or further characterized. However, this subcloning process for transferring inserts from display vectors to expression vectors is time-consuming, laborious and low-throughput. In addition, the efficiency of subcloning is low and unpredictable, varying from about 50% to 80%. Thus, a need exists for a fast, inexpensive and high-throughput method to accelerate the affinity binder identification and characterization processes.

SUMMARY OF THE INVENTION

Aspects of the invention relate to the design and use of vectors for high-throughput conversion, e.g., from display to expression. In some embodiments, the design requires judicious engineering of restriction enzyme sites at specific sequence locations, and the use of combinations of restriction enzymes and DNA ligases to facilitate the conversion. Various libraries can be constructed using vectors of the present invention to enable high-throughput library screening, binder selection/recovery, and binder characterization. Upon implementation of such high-throughput processes, it has been surprisingly discovered that a high conversion rate (e.g., over 90%, over 95%, or close to 100%) can be achieved, thereby greatly accelerating the binder discovery and characterization processes.

In one aspect, a recombinant polynucleotide suitable for use in constructing various vectors is provided. The recombinant polynucleotide comprises, from 5′ to 3′: a first nucleic acid sequence encoding an amino acid sequence to be displayed on a surface; a first restriction site selected from the group consisting of XbaI, NcoI, SalI and XhoI sites; a second nucleic acid sequence encoding a surface peptide capable of being displayed on the surface; and a second restriction site selected from the group consisting of XbaI, NcoI, SalI and XhoI sites. In various embodiments, the first nucleic acid sequence is engineered in-frame with the second nucleic acid sequence. The first and second restriction sites, when cleaved by the corresponding restriction endonucleases, produce compatible sticky ends.
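Which pairings of the four listed sites actually give compatible sticky ends can be checked from the recognition sequences alone. The following is a minimal sketch, assuming the standard recognition sites and top-strand cut positions (XbaI T^CTAGA, NcoI C^CATGG, SalI G^TCGAC, XhoI C^TCGAG) and treating two 5′ overhangs as compatible when one is the reverse complement of the other. The names `ENZYMES`, `overhang` and `compatible` are illustrative only and do not come from any library.

```python
# Recognition sites (top strand, 5'->3') and top-strand cut offsets for the
# four enzymes named above. Each site is palindromic and is cut after its
# first base, which leaves a 4-nt 5' overhang.
ENZYMES = {
    "XbaI": ("TCTAGA", 1),
    "NcoI": ("CCATGG", 1),
    "SalI": ("GTCGAC", 1),
    "XhoI": ("CTCGAG", 1),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def overhang(enzyme: str) -> str:
    """5' overhang left by cutting a palindromic site after `offset` bases.

    For a palindrome of length n cut after position k on the top strand,
    the bottom strand is cut after position n - k, so the single-stranded
    region is site[k:n - k].
    """
    site, offset = ENZYMES[enzyme]
    return site[offset:len(site) - offset]


def compatible(enzyme_a: str, enzyme_b: str) -> bool:
    """True if the two 5' overhangs can anneal, and hence be ligated."""
    return overhang(enzyme_a) == reverse_complement(overhang(enzyme_b))


if __name__ == "__main__":
    names = sorted(ENZYMES)
    for i, a in enumerate(names):
        for b in names[i:]:
            if compatible(a, b):
                print(f"{a} + {b}: overhang 5'-{overhang(a)}")
```

Run as a script, it lists each enzyme as compatible with itself and SalI as compatible with XhoI (both leave 5′-TCGA), whereas XbaI (CTAG) and NcoI (CATG) ends cannot be joined to each other or to SalI/XhoI ends. The first and second sites therefore have to be chosen as one of these compatible pairings.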

In some embodiments, the first nucleic acid sequence encodes an antibody fragment such as a Fab. The second nucleic acid sequence may encode a phage coat protein, a yeast outer wall protein, a bacterial outer membrane protein, a cell surface tether domain, or an adapter, or a truncation or derivative thereof. For example, the second nucleic acid sequence can be gene III of filamentous phage M13, or a truncation or derivative thereof. The second nucleic acid sequence can also encode an adapter capable of binding to a binding partner, wherein said binding partner is expressed as a fusion and directly displayed on the surface. Correspondingly, the surface peptide can be for phage display, yeast display, bacterial display or mammalian display, or for shuttling display between different hosts (e.g., via adapters). In various embodiments, when expressed, the surface peptide and the amino acid sequence encoded by the first nucleic acid sequence are displayed as a fusion protein on the surface.

In certain embodiments, the first and second restriction sites each encode amino acids that do not interfere with the binding affinity of the amino acid sequence and/or with display of the surface peptide or fusion protein. In some embodiments, the first and/or second restriction site is an XbaI site.

In another aspect, a display vector for high-throughput conversion into an expression vector is provided. The display vector comprises the recombinant polynucleotide described herein and a fusion tag sequence 5′ to the first restriction site or 3′ to the second restriction site. In some embodiments, when the fusion tag sequence is 3′ to the second restriction site, it is engineered such that upon (a) removal of the second nucleic acid sequence by cleaving the first and second restriction sites, and (b) religation of the compatible sticky ends produced therefrom, the first nucleic acid sequence is in-frame with the fusion tag sequence. The fusion tag sequence can be selected from one or more of: an alkaline phosphatase tag, an AviTag, a cutinase tag, a HaloTag, a FLAG tag, a c-myc tag, a histidine tag, a GST tag, a green fluorescent protein tag, an HA tag, an E-tag, a Strep tag, a Strep tag II and a YoI 1/34 tag. Such a display vector can be provided in a library of display vectors in which each display vector has a unique first nucleic acid sequence, thereby forming a library for selection.

In a further aspect, a method of converting a display vector to an expression vector is provided. The method includes: providing the display vector described herein; cleaving the first and second restriction sites with the corresponding restriction endonucleases, thereby producing said compatible sticky ends; and religating the compatible sticky ends to produce an expression vector in which the first nucleic acid sequence is in-frame with the fusion tag sequence.
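A toy model of the conversion step shows why the frame of the flanking sites matters. The following is a minimal sketch, assuming XbaI at both flanking positions and using short hypothetical sequences (a 15-nt insert, a 15-nt stand-in for the surface-peptide gene, and a His6 tag) instead of real vector sequence. Cutting both XbaI sites on the top strand and joining the outer fragments regenerates a single XbaI site, and translating the result confirms that the insert reads through into the tag. The helper names `excise_and_religate` and `translate` are hypothetical.

```python
import itertools

XBAI_SITE, XBAI_CUT = "TCTAGA", 1  # T^CTAGA

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... (T, C, A, G).
_BASES = "TCAG"
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(itertools.product(_BASES, repeat=3), _AA)
}


def translate(seq: str) -> str:
    """Translate from position 0 in frame; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def excise_and_religate(seq: str, site: str = XBAI_SITE, cut: int = XBAI_CUT) -> str:
    """Remove everything between the first two copies of `site`.

    Models cleavage at both sites followed by intramolecular religation of the
    compatible ends: the 5' fragment up to the first cut is joined to the 3'
    fragment starting at the second cut, which regenerates one copy of the site.
    """
    first = seq.find(site)
    second = seq.find(site, first + 1) if first >= 0 else -1
    if first < 0 or second < 0:
        raise ValueError("need two copies of the restriction site")
    return seq[:first + cut] + seq[second + cut:]


# Hypothetical parts, for illustration only.
insert = "ATG" + "GCT" * 4      # Met-Ala-Ala-Ala-Ala
surface_gene = "GGT" * 5        # stand-in for the surface-peptide gene
his_tag = "CAC" * 6 + "TAA"     # His6 followed by a stop codon

display = insert + XBAI_SITE + surface_gene + XBAI_SITE + his_tag
expression = excise_and_religate(display)

if __name__ == "__main__":
    print(translate(display))     # insert, Ser-Arg, surface stand-in, Ser-Arg, His6, stop
    print(translate(expression))  # insert, Ser-Arg, His6, stop
```

In the frame used here the XbaI site is read as TCT AGA (Ser-Arg). If the same site instead began at the second position of a codon, it would be read as ..T CTA GA.., whereas beginning at the first-position-plus-one frame (xTC TAG A) places the TAG stop codon in frame and terminates translation. This is why the site position has to be chosen with the reading frame in mind.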

In some embodiments, the method can further include cleaving within the second nucleic acid sequence to increase religation efficiency in the religating step. In certain embodiments, before the religating step, a product from the cleaving step can be diluted to increase intramolecular ligation.
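Dilution favors intramolecular ligation for the following reason. Intramolecular joining depends on the effective concentration of one end of a molecule near its own other end, and dilution does not change this. Intermolecular joining, by contrast, depends on the concentration of ends from other molecules, which does fall on dilution. The following is a minimal sketch for estimating the molar concentration of a linearized vector and the fold dilution needed to reach a chosen target. It assumes an average mass of 650 g/mol per base pair, which is a common approximation. The target concentration is a user choice and not a value given in this document, and the function names are hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 650.0  # approximate mass of one base pair of dsDNA


def dsdna_nM(ng_per_ul: float, length_bp: int) -> float:
    """Molar concentration (nM) of a linear dsDNA molecule.

    1 ng/uL equals 1 mg/L; dividing by the molar mass (length_bp * 650 g/mol)
    and converting mol/L to nM gives ng_per_ul * 1e6 / (length_bp * 650).
    """
    return ng_per_ul * 1e6 / (length_bp * AVG_BP_MASS_G_PER_MOL)


def ends_nM(ng_per_ul: float, length_bp: int) -> float:
    """Concentration of ligatable ends: each linear molecule carries two."""
    return 2 * dsdna_nM(ng_per_ul, length_bp)


def dilution_factor(ng_per_ul: float, length_bp: int, target_nM: float) -> float:
    """Fold dilution needed to bring the molecule concentration down to target_nM."""
    if target_nM <= 0:
        raise ValueError("target_nM must be positive")
    return max(1.0, dsdna_nM(ng_per_ul, length_bp) / target_nM)


if __name__ == "__main__":
    # Hypothetical digest: 50 ng/uL of a 5,000-bp linearized vector backbone.
    print(f"{dsdna_nM(50, 5000):.2f} nM molecules, {ends_nM(50, 5000):.2f} nM ends")
    print(f"dilute {dilution_factor(50, 5000, target_nM=1.0):.1f}-fold to reach 1 nM")
```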

In various embodiments, after the religating step, the expression vector produced therefrom can be introduced into a host to further characterize the first nucleic acid sequence. Exemplary further characterization includes sequencing the first nucleic acid sequence and/or expressing the first nucleic acid sequence.

In various embodiments, the method is a high-efficiency method. For example, the providing, cleaving, religating and introducing steps can be completed in less than 12 hours, in less than 8 hours, or in less than 4 hours.

The method can be performed in a high-throughput manner, where a plurality of display vectors are converted to a plurality of expression vectors in parallel. Thereafter, the plurality of expression vectors can be introduced into a population of hosts for further characterization. Such high-throughput conversion can have a conversion rate (from the plurality of display vectors to the plurality of expression vectors) that is higher than 90%, higher than 95%, or higher than 98%.
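In practice, the conversion rate is estimated by checking a sample of clones after conversion, for example by restriction digest or PCR to confirm that the surface-peptide sequence has been removed. The following is a minimal sketch that computes the observed rate. Because a few dozen clones only estimate the true rate, it also computes a Wilson score interval for that rate. The clone counts are hypothetical, and the choice of the Wilson interval is an illustrative assumption rather than part of the method described above.

```python
import math


def conversion_rate(converted: int, total: int) -> float:
    """Fraction of screened clones in which conversion succeeded."""
    if total <= 0 or not 0 <= converted <= total:
        raise ValueError("need 0 <= converted <= total and total > 0")
    return converted / total


def wilson_interval(converted: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the rate; z = 1.96 gives roughly 95% coverage."""
    p = conversion_rate(converted, total)
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical result: 94 of 96 picked clones lost the surface-peptide gene.
    rate = conversion_rate(94, 96)
    low, high = wilson_interval(94, 96)
    print(f"rate {rate:.1%}, 95% CI {low:.1%} to {high:.1%}")
    print("observed rate above 90%:", rate > 0.90)
```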

In yet another aspect, a method of identifying an affinity binder to a target is provided. The method includes: screening a population of first hosts, each containing the display vector described herein, to obtain a subpopulation of first hosts having binding affinity to a target, wherein each first host displays, on its surface, the amino acid sequence encoded by a unique first nucleic acid sequence in the display vector, and wherein the first hosts of said subpopulation each display an affinity binder to said target; converting display vectors isolated from said subpopulation of first hosts to expression vectors by cleaving the first and second restriction sites to remove the second nucleic acid sequence and religating the compatible sticky ends produced therefrom; and introducing said expression vectors into a population of second hosts to further characterize the affinity binder. In various embodiments, the method can be performed in a high-throughput fashion, wherein the converting step can have a conversion rate that is higher than 90%, higher than 95%, or higher than 98%.

Also provided are libraries and kits constructed using the vectors described herein and for carrying out the methods described herein. Further provided are sublibraries generated during the screening process and affinity binders obtained from the methods described herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 provides a schematic restriction map showing an exemplary vector of the present invention.

FIG. 2 provides a gel electrophoresis image showing the conversion rate of an exemplary vector at different concentrations. One restriction enzyme, XbaI, was used.

FIG. 3 provides a gel electrophoresis image showing the conversion rate of an exemplary vector at different concentrations. Two restriction enzymes, XbaI and XmnI, were used.

DETAILED DESCRIPTION OF THE INVENTION

This invention relates to a high-throughput process that integrates affinity binder discovery with downstream characterization thereof. Such a high-throughput process has surprisingly been shown to save a tremendous amount of time and effort. As a result, the entire process, from library screening and binder selection to binder characterization, can be streamlined, dramatically accelerating the pace of affinity binder discovery and characterization. In practice, the high-throughput process of the present invention is commercially desirable in affinity reagent discovery, and can be applied to various display platforms, including phage display, bacterial display, yeast display and mammalian display, as well as other display systems.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the compositions and methods described herein. In this application, the use of the singular includes the plural unless specifically stated otherwise. Also, the use of “or” means “and/or” unless stated otherwise. Similarly, “comprise,” “comprises,” “comprising,” “include,” “includes” and “including” are not intended to be limiting. It is understood that aspects and embodiments of the invention described herein include “consisting of” and/or “consisting essentially of” aspects and embodiments.

DEFINITIONS

For convenience, certain terms employed in the specification, examples, and appended claims are collected here. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

As used herein, the following terms and phrases are intended to have the following meanings:

The articles “a” and “an” are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element.

By “a population of hosts” is meant a group of hosts into which a library of polynucleotides can be introduced and displayed. The host can be phages, yeasts, bacteria or mammalian cells. In some embodiments, a population of cells from a monoculture, i.e., wherein each cell in the population is of the same cell type, can be used. Alternatively, mixed cultures of cells can also be used. Cells may be adherent, i.e., cells which grow attached to a solid substrate, or, alternatively, the cells may be in suspension. Mammalian cells may be cells derived from primary tumors, cells derived from metastatic tumors, primary cells, cells which have lost contact inhibition, transformed primary cells, immortalized primary cells, cells which may undergo apoptosis, and cell lines derived therefrom.

As used herein, the term “about” means within 20%, more preferably within 10% and most preferably within 5%.

The terms “affinity binder,” “binder,” and “binding protein” are used interchangeably to refer to a peptidic chain having a specific or general affinity for another protein or molecule. Proteins are brought into contact and form a complex when binding is possible. The affinity binder of the invention expressed on the surface of a host can preferably be an antibody, a fragment or derivative of an antibody, a protein or a peptide. An antibody or a peptide “affinity binds,” “specifically binds” or “preferentially binds” to an antigen or a target if it binds with greater affinity or avidity, more readily, and/or with greater duration than it binds to other substances. It is understood by reading this definition that, for example, an antibody or a peptide that specifically or preferentially binds to a first target may or may not specifically or preferentially bind to a second target. As such, “affinity binding,” “specific binding” or “preferential binding” does not necessarily require (although it can include) exclusive binding. In some embodiments, “specific binding” means that the protein exhibits appreciable affinity for a particular antigen or epitope and, generally, does not exhibit significant cross-reactivity. An antigen binding protein that “does not exhibit significant cross-reactivity” is one that will not appreciably bind to an entity other than its target (e.g., a different epitope or a different molecule). A protein specific for a particular epitope will, for example, not significantly cross-react with remote epitopes on the same protein or peptide. Specific binding can be determined by any art-recognized means for determining such binding. Preferably, specific binding is determined by Scatchard analysis and/or competitive binding assays. “Affinity” binding includes binding with an affinity of at least 10⁶, 10⁷, 10⁸, 10⁹ or 10¹⁰ M⁻¹. Antigen binding proteins with affinities greater than 10⁷ M⁻¹ or 10⁸ M⁻¹ typically bind with correspondingly greater specificity.
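The affinities above are association constants (Ka, in M⁻¹). The equilibrium dissociation constant is the reciprocal of Ka, so Ka = 10⁸ M⁻¹ corresponds to Kd = 10 nM. The following is a minimal sketch of a Scatchard analysis under a single-site binding model. Under that model, bound/free = Ka(Bmax − bound), so a straight-line fit of bound/free against bound has slope −Ka and x-intercept Bmax. The data points are synthetic, generated from assumed values of Ka and Bmax. The fit is a plain least-squares illustration and not a validated analysis tool. The function names are hypothetical.

```python
def scatchard_fit(bound: list[float], free: list[float]) -> tuple[float, float]:
    """Least-squares fit of bound/free vs bound; returns (Ka in M^-1, Bmax in M)."""
    x = bound
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ka = -slope              # the Scatchard slope is -Ka
    bmax = intercept / ka    # x-intercept (bound/free = 0) is Bmax
    return ka, bmax


def kd_nM(ka_per_M: float) -> float:
    """Dissociation constant in nM from an association constant in M^-1."""
    return 1e9 / ka_per_M


# Synthetic single-site data generated from assumed Ka = 1e8 M^-1 and Bmax = 2 nM.
KA_TRUE, BMAX_TRUE = 1e8, 2e-9
free = [1e-10, 3e-10, 1e-9, 3e-9, 1e-8]
bound = [BMAX_TRUE * KA_TRUE * f / (1 + KA_TRUE * f) for f in free]

if __name__ == "__main__":
    ka, bmax = scatchard_fit(bound, free)
    print(f"Ka = {ka:.3g} M^-1, Kd = {kd_nM(ka):.3g} nM, Bmax = {bmax:.3g} M")
    print("at least 1e6 M^-1:", ka >= 1e6)
```

Because the synthetic points lie exactly on the model, the fit recovers the assumed constants. Real binding data are noisy, and the Scatchard transformation is known to distort errors, so a direct nonlinear fit is often preferred in practice.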

As used herein, the term “amino acid sequence” refers to a sequence of contiguous amino acid residues of any length. The terms “polypeptide,” “peptide,” “oligopeptide,” or “protein” may be used interchangeably herein with the term “amino acid sequence.”

An “antibody” is an immunoglobulin molecule capable of specific binding to a target, such as a carbohydrate, polynucleotide, lipid, polypeptide, etc., through at least one antigen recognition site, located in the variable region of the immunoglobulin molecule. As used herein, the term encompasses not only intact polyclonal or monoclonal antibodies, but also fragments thereof (such as Fab, Fab′, F(ab′)₂, Fv), single chain antibodies (scFv), mutants thereof, naturally occurring variants, fusion proteins comprising an antibody portion with an antigen recognition site of the required specificity, humanized antibodies, chimeric antibodies, and any other modified configuration of the immunoglobulin molecule that comprises an antigen recognition site of the required specificity.

“Antibody fragments” comprise only a portion of an intact antibody, generally including an antigen binding site of the intact antibody and thus retaining the ability to bind antigen. Examples of antibody fragments encompassed by the present definition include: (i) the Fab fragment, having VL, CL, VH and CH1 domains; (ii) the Fab′ fragment, which is a Fab fragment having one or more cysteine residues at the C-terminus of the CH1 domain; (iii) the Fd fragment, having VH and CH1 domains; (iv) the Fd′ fragment, having VH and CH1 domains and one or more cysteine residues at the C-terminus of the CH1 domain; (v) the Fv fragment, having the VL and VH domains of a single antibody; (vi) the dAb fragment, which consists of a VH domain; (vii) isolated CDR regions; (viii) F(ab′)₂ fragments, bivalent fragments including two Fab′ fragments linked by a disulfide bridge at the hinge region; (ix) single chain antibody molecules (e.g., single chain Fv; scFv); (x) “diabodies” with two antigen binding sites, comprising a heavy chain variable domain (VH) connected to a light chain variable domain (VL) in the same polypeptide chain; and (xi) “linear antibodies” comprising a pair of tandem Fd segments (VH-CH1-VH-CH1) which, together with complementary light chain polypeptides, form a pair of antigen binding regions.

The term “antibody variant” as used herein refers to an antibody with single or multiple mutations in the heavy chains and/or light chains. In some embodiments, the mutations exist in the variable region. In some embodiments, the mutations exist in the constant region.

The term “binding partner” is used interchangeably with “target” to refer to a molecule that is recognized and bound by a peptide or protein. The binding partner can be an antigen, or an epitope thereof, when binding to an antibody or an antibody fragment. Binding partners can be used in a screen to identify affinity binders thereto.

“Biopanning” is an affinity selection technique which selects for peptides that bind to a given target. Biopanning involves three major steps: capturing affinity binders with a target (“panning”) using a previously constructed library, washing, and elution. Details are provided hereunder.

A “cell surface tether domain,” as used herein, refers to an amino acid sequence that confers on a polypeptide the ability to be associated with a host cell outer membrane, and which is sometimes, but not always, naturally present in the protein of interest. As described herein, cell surface tether domains include, for example, transmembrane domains or glycosylphosphatidylinositol (GPI) signal sequences.

“Chimeric antibodies” refers to those antibodies wherein one portion of each of the amino acid sequences of the heavy and light chains is homologous to corresponding sequences in antibodies derived from a particular species or belonging to a particular class, while the remaining segment of the chains is homologous to corresponding sequences in another. Typically, in these chimeric antibodies, the variable regions of both light and heavy chains mimic the variable regions of antibodies derived from one species of mammal, while the constant portions are homologous to the sequences in antibodies derived from another. One clear advantage of such chimeric forms is that, for example, the variable regions can conveniently be derived from presently known sources using readily available hybridomas or B cells from non-human host organisms in combination with constant regions derived from, for example, human cell preparations. While the variable region has the advantage of ease of preparation, and the specificity is not affected by its source, the constant region, being human, is less likely to elicit an immune response from a human subject when the antibodies are injected than would a constant region from a non-human source. However, the definition is not limited to this particular example.

As generally understood, a “codon” is a series of three nucleotides (a triplet) that encodes a specific amino acid residue in a polypeptide chain or the termination of translation (stop codons). There are 64 different codons (61 codons encoding amino acids plus 3 stop codons) but only 20 different translated amino acids. This redundancy allows many amino acids to be encoded by more than one codon. Different organisms (and organelles) often show particular preferences or biases for one of the several codons that encode the same amino acid.
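The codon arithmetic above can be checked against the standard genetic code. The following is a minimal sketch that assumes the standard (nuclear) code, written as the conventional 64-letter string in TCAG order. Organelles and some organisms use variant codes that would change these counts.

```python
import itertools
from collections import Counter

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ...; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_residue = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")

if __name__ == "__main__":
    print("codons:", len(CODON_TABLE))                  # 4**3 possible triplets
    print("stop codons:", stop_codons)
    print("amino acids:", len(codons_per_residue) - 1)  # minus the '*' stop symbol
    print("codons for Leu, Met, Trp:",
          codons_per_residue["L"], codons_per_residue["M"], codons_per_residue["W"])
```

Leucine, serine and arginine each have six codons, while methionine and tryptophan have only one. This uneven redundancy is what codon-usage bias acts on.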

A “constant region” of an antibody refers to the constant region of the antibody light chain or the constant region of the antibody heavy chain, either alone or in combination. The constant regions of the light chain (CL) and the heavy chain (CH1, CH2 or CH3, or CH4 in the case of IgM and IgE) confer important biological properties such as secretion, transplacental mobility, Fc receptor binding, complement binding, and the like. By convention, the numbering of the constant region domains increases as they become more distal from the antigen binding site or amino terminus of the antibody.

“Degenerate sequences” are nucleic acid sequences having a length of $N_1$ nucleotides and comprising up to $4^{N_1}$ different sequences. In general, a sequence is called “degenerate” if some or all of its positions have several possible bases or substitutions. Assuming $\Sigma = \{T, C, A, G\}$ is the DNA alphabet, a sequence (e.g., a primer) can be written as $S = x_1 x_2 \cdots x_l$, where $x_i \subseteq \Sigma$, $x_i \neq \emptyset$, and $l$ is the length of $S$. The degeneracy of a sequence is the number of unique sequence combinations it contains, which can be calculated as $d(S) = \prod_{i=1}^{l} |x_i|$. In some embodiments, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues. The degeneracy can also be selective or constrained, where only a selected subset of positions is subject to substitution, and/or one or more or all positions are substituted under constraints by optimizing the composition and ratio of the mixed-base and/or deoxyinosine residues.
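The degeneracy formula can be evaluated directly when each position is written with a standard IUPAC nucleotide code, so that $|x_i|$ is the number of bases that code allows. The following is a minimal sketch that assumes IUPAC codes and does not model deoxyinosine. For example, an NNK codon (K = G or T) gives 4 × 4 × 2 = 32 variants. The function names are illustrative.

```python
from itertools import product
from math import prod

# IUPAC nucleotide codes mapped to the set of DNA bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(seq: str) -> int:
    """d(S): product over positions of the number of allowed bases |x_i|."""
    return prod(len(IUPAC[ch]) for ch in seq.upper())


def expand(seq: str) -> list[str]:
    """Enumerate every concrete sequence encoded by a degenerate sequence."""
    return ["".join(p) for p in product(*(IUPAC[ch] for ch in seq.upper()))]


if __name__ == "__main__":
    for s in ["ACGT", "NNK", "NNKNNK"]:
        print(s, degeneracy(s))
    print(expand("RY"))
```

The `expand` helper is only practical for short sequences, since its output grows as $d(S)$. For library-scale sequences, `degeneracy` alone is the useful quantity.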

“Display” refers to presentation of different recombinant polypeptides on the surface of a host such as phages, yeasts, bacteria and mammalian cells.

As used herein, the term “display vector” refers to a plasmid or phage DNA or other DNA sequence which is able to replicate autonomously in a host, and is capable of expressing and displaying an insert in the vector as part of a fusion protein on the surface of the host.

“Diversity” of a library refers to the number of different recombinant polypeptides encoded by the polynucleotides in the library.

The term “expression vector” refers to a vector capable of expressing a gene or any open reading frame that has been cloned into it. Such expression can occur after transformation into a host cell, or in in vitro systems. The cloned DNA or insert is usually operably linked to one or more regulatory sequences, such as promoters, activator/repressor binding sites, terminators, enhancers and the like.

The terms “fuse,” “fusion” and “link” refer to the covalent linkage between two polypeptides in a single protein. The polypeptides are typically joined via a peptide bond, either directly to each other or via an amino acid linker. Optionally, the polypeptides can be joined via non-peptide covalent linkages known to those of skill in the art.

As used herein, the term “fusion tag” refers to a peptide or protein located at either the C- or N-terminus of the target protein, which improves one or more of the solubility, detection, purification and expression of the target protein. The fusion tag is generally engineered in-frame with the target protein. Commonly used fusion tags include, but are not limited to: alkaline phosphatase tag, AviTag, cutinase tag, HaloTag, FLAG tag, c-myc tag, histidine (His) tag, GST tag, green fluorescent protein (GFP) tag, HA tag, E-tag, Strep tag, Strep tag II and YoI 1/34 tag.

The term “heavy chain” as used herein refers to the larger immunoglobulin subunit, which associates, through its amino-terminal region, with the immunoglobulin light chain. The heavy chain comprises a variable region (VH) and a constant region (CH). The constant region further comprises the CH1, hinge, CH2 and CH3 domains. In the case of IgE, IgM and IgY, the heavy chain comprises a CH4 domain but does not have a hinge domain. Those skilled in the art will appreciate that heavy chains are classified as gamma, mu, alpha, delta or epsilon (γ, μ, α, δ, ε), with some subclasses among them (e.g., γ1-γ4). It is the nature of this chain that determines the “class” of the antibody as IgG, IgM, IgA, IgD or IgE, respectively. The immunoglobulin subclasses (isotypes), e.g., IgG1, IgG2, IgG3, IgG4, IgA1, etc., are well characterized and are known to confer functional specialization.

A “host” is intended to include any individual virus or cell, or culture thereof, that can be or has been a recipient for vectors or for the incorporation of exogenous nucleic acid molecules, polynucleotides and/or proteins. It is also intended to include progeny of a single virus or cell. The progeny may not necessarily be completely identical (in morphology or in genomic or total DNA complement) to the original parent cell due to natural, accidental or deliberate mutation. The virus can be a phage. The cells may be prokaryotic or eukaryotic, and include but are not limited to bacterial cells, yeast cells, insect cells, animal cells, and mammalian cells, e.g., murine, rat, simian or human cells.

“Humanized” antibodies refer to molecules having an antigen-binding site that is substantially derived from an immunoglobulin from a non-human species, with the remaining immunoglobulin structure of the molecule based upon the structure and/or sequence of a human immunoglobulin. The antigen-binding site may comprise either complete variable domains fused onto constant domains or only the complementarity determining regions (CDRs) grafted onto appropriate framework regions in the variable domains. Antigen binding sites may be wild type or modified by one or more amino acid substitutions, e.g., modified to resemble human immunoglobulin more closely. Some forms of humanized antibodies preserve all CDR sequences (for example, a humanized mouse antibody which contains all six CDRs from the mouse antibody). Other forms of humanized antibodies have one or more CDRs (one, two, three, four, five or six) which are altered with respect to the original antibody; these are also termed one or more CDRs “derived from” one or more CDRs.

A gene or reading frame is “in-frame” with another when the two genes or reading frames are cloned together to generate a contiguous reading frame without disrupting the codons therein, such that, when expressed, a fusion protein comprising the products of both genes or reading frames is produced. Generally, the upstream gene or reading frame is an open reading frame. A reading frame is a way of dividing the sequence of nucleotides in a nucleic acid into a set of consecutive, non-overlapping codons. An open reading frame (ORF) is a reading frame that contains a start codon and a subsequent region, usually with a length that is a multiple of 3 nucleotides, that does not contain a stop codon in the given reading frame.
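The ORF and in-frame definitions above can be turned into a scan over the three forward reading frames. The following is a minimal sketch under several simplifying assumptions: it reports only complete ORFs (ATG through the first in-frame TAA, TAG or TGA), ignores the reverse strand and alternative start codons, and skips ATGs nested inside an ORF it has already found. The names `find_orfs` and `in_frame` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each complete ORF on the forward strand.

    `start` is the index of the ATG; `end` is the index just past the stop
    codon. ATGs nested inside an ORF already found are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no in-frame stop before the end: not a complete ORF
                orfs.append((frame, i, j + 3))
                i = j + 3
            else:
                i += 3
    return orfs


def in_frame(upstream_start: int, downstream_start: int) -> bool:
    """Positions share a reading frame if they differ by a multiple of 3.

    Stop codons between the two positions must be checked separately.
    """
    return (downstream_start - upstream_start) % 3 == 0


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG"))
```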

An “insert,” as used herein, is a heterologous nucleic acid sequence that is ligated into a compatible site of a vector. An insert may comprise one or more nucleic acid sequences that encode a polypeptide or polypeptides. An insert may comprise regulatory regions or other nucleic acid elements.

An “isolated” or “purified” polypeptide or polynucleotide, e.g., an “isolated polypeptide” or an “isolated polynucleotide,” is purified to a state beyond that in which it exists in nature. For example, the “isolated” or “purified” polypeptide or polynucleotide can be substantially free of cellular material or other contaminating proteins from the cell or tissue source from which the protein or polynucleotide is derived, or substantially free from chemical precursors or other chemicals when chemically synthesized. A preparation of antigen binding protein having less than about 50% of non-antigen binding protein (also referred to herein as a “contaminating protein”), or of chemical precursors, is considered to be “substantially free.” Preparations having less than about 40%, 30%, 20%, 10%, and more preferably 5% (by dry weight) of non-antigen binding protein or of chemical precursors are also considered to be substantially free.

“Library,” as used herein, refers to a diverse collection or mixture of polynucleotides comprising polynucleotides encoding different recombinant polypeptides. In certain embodiments, a library of polynucleotides may comprise at least 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, or more or fewer different polynucleotides within a given collection of polynucleotides. Typically, the different polynucleotides in the library are related through, for example, their origin from a single animal species (for example, human, mouse, rabbit, goat, horse, etc.), tissue type, organ or cell type. A “library” may comprise polynucleotides of a common genus. For example, the genus can be polynucleotides encoding an immunoglobulin subunit polypeptide of a certain type and class, e.g., a library might encode an antibody μ, γ1, γ2, γ3, γ4, α1, α2, δ or ε heavy chain, or an antibody κ or λ light chain. Although each member of any one library described herein may encode the same heavy or light chain constant region, the library may collectively comprise at least 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵ or more or fewer different variable regions, i.e., a “plurality” of variable regions associated with the common constant region.
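Library sizes such as 10⁹ or 10¹⁰ members matter in practice because any screen samples only a finite number of clones. A simple estimate uses a Poisson model, which assumes that all members are equally represented and that clones are drawn at random. Under that model, the expected fraction of distinct members sampled at least once is 1 − exp(−N/D) for N clones sampled from a library of diversity D. This model is an assumption of the following minimal sketch and is not part of the definition above. The function names are hypothetical.

```python
import math


def expected_coverage(clones_sampled: float, diversity: float) -> float:
    """Expected fraction of library members seen at least once (Poisson model)."""
    return 1.0 - math.exp(-clones_sampled / diversity)


def clones_for_coverage(target_fraction: float, diversity: float) -> float:
    """Clones needed for the expected coverage to reach target_fraction."""
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    return -diversity * math.log(1.0 - target_fraction)


if __name__ == "__main__":
    diversity = 1e9  # hypothetical library diversity
    for fold in (1, 3, 5):
        cov = expected_coverage(fold * diversity, diversity)
        print(f"{fold}x oversampling -> {cov:.1%} expected coverage")
    print(f"{clones_for_coverage(0.95, diversity):.2e} clones for 95% coverage")
```

Real libraries are rarely uniform, so these figures are best read as optimistic lower bounds on the sampling effort required.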

The term “light chain” as used herein refers to the smaller immunoglobulin subunit, which associates with the amino-terminal region of a heavy chain. As with a heavy chain, a light chain comprises a variable region (VL) and a constant region (CL). Light chains are classified as either kappa or lambda (κ, λ). A pair of these can associate with a pair of any of the various heavy chains to form an immunoglobulin molecule. Also encompassed in the meaning of light chain are light chains with a lambda variable region (V-lambda) linked to a kappa constant region (C-kappa) or a kappa variable region (V-kappa) linked to a lambda constant region (C-lambda).

The terms “marker” and “reporter” refer to a gene or protein that can be attached to a regulatory sequence of another gene or protein of interest, so that, upon expression in a host cell or organism, the reporter confers characteristics that can be relatively easily selected, identified and/or measured. Reporter genes are often used as an indication of whether a certain gene has been introduced into or expressed in the host cell or organism. Examples of commonly used reporters include: antibiotic resistance genes, auxotrophic markers, β-galactosidase (encoded by the bacterial gene lacZ), luciferase (from fireflies), chloramphenicol acetyltransferase (CAT; from bacteria), GUS (β-glucuronidase; commonly used in plants) and green fluorescent protein (GFP; from jellyfish). Reporters or markers can be selectable or screenable. A selectable marker (e.g., an antibiotic resistance gene or auxotrophic marker) is a gene that confers a trait suitable for artificial selection; typically, host cells expressing the selectable marker are protected from a selective agent that is toxic or inhibitory to cell growth. A screenable marker (e.g., gfp, lacZ) generally allows researchers to distinguish between wanted cells (expressing the marker) and unwanted cells (not expressing the marker or expressing it at an insufficient level).

“Nucleic acid,” “nucleic acid sequence,” “oligonucleotide,”“polynucleotide” or other grammatical equivalents as used herein meansat least two nucleotides, either deoxyribonucleotides orribonucleotides, or analogs thereof, covalently linked together.Polynucleotides are polymers of any length, including, e.g., 20, 50,100, 200, 300, 500, 1000, 2000, 3000, 5000, 7000, 10,000, etc. Apolynucleotide described herein generally contains phosphodiester bonds,although in some cases, nucleic acid analogs are included that may haveat least one different linkage, e.g., phosphoramidate, phosphorothioate,phosphorodithioate, or O-methylphophoroamidite linkages, and peptidenucleic acid backbones and linkages. Mixtures of naturally occurringpolynucleotides and analogs can be made; alternatively, mixtures ofdifferent polynucleotide analogs, and mixtures of naturally occurringpolynucleotides and analogs may be made. The following are non-limitingexamples of polynucleotides: a gene or gene fragment, exons, introns,messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA,cRNA, recombinant polynucleotides, branched polynucleotides, plasmids,vectors, isolated DNA of any sequence, isolated RNA of any sequence,nucleic acid probes, and primers. A polynucleotide may comprise modifiednucleotides, such as methylated nucleotides and nucleotide analogs. Ifpresent, modifications to the nucleotide structure may be impartedbefore or after assembly of the polymer. The sequence of nucleotides maybe interrupted by non-nucleotide components. A polynucleotide may befurther modified after polymerization, such as by conjugation with alabeling component. The term also includes both double- andsingle-stranded molecules. Unless otherwise specified or required, anyembodiment of this invention that is a polynucleotide encompasses boththe double-stranded form and each of two complementary single-strandedforms known or predicted to make up the double-stranded form. 
A polynucleotide is composed of a specific sequence of four nucleotide bases: adenine (A), cytosine (C), guanine (G) and thymine (T), with uracil (U) in place of thymine when the polynucleotide is RNA. Thus, the term “polynucleotide sequence” refers to the alphabetical representation of a polynucleotide molecule. Unless otherwise indicated, a particular polynucleotide sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues.
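The complementary-sequence and third-position-degeneracy conventions above can be illustrated with a short Python sketch. It is illustrative only: the helper names are hypothetical, and "N" (the IUPAC code for any base) is assumed as the mixed-base residue; deoxyinosine or another mixed base would be handled analogously.

```python
# Sketch only: complementary sequences and third-position degeneracy.
# Using "N" (IUPAC: any base) as the mixed-base residue is an assumption.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def degenerate_third_positions(orf: str, mixed_base: str = "N") -> str:
    """Replace the third base of every complete codon with a mixed base.

    Any trailing partial codon is dropped.
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    codons = (orf[i:i + 3] for i in range(0, usable, 3))
    return "".join(codon[:2] + mixed_base for codon in codons)


print(reverse_complement("ATGGCC"))          # GGCCAT
print(degenerate_third_positions("ATGGCC"))  # ATNGCN
```

Palindromic restriction sites such as TCTAGA are their own reverse complement, which is why such sites read identically on both strands.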

“Operably linked” refers to a juxtaposition of two or more components, wherein the components so described are in a relationship permitting them to function in their intended manner. For example, a promoter and/or enhancer is operably linked to a coding sequence if it acts in cis to control or modulate the transcription of the linked sequence. Generally, but not necessarily, the DNA sequences that are “operably linked” are contiguous and, where necessary to join two protein coding regions or in the case of a secretory leader, contiguous and in frame. However, although an operably linked promoter is generally located upstream of the coding sequence, it is not necessarily contiguous with it. A polyadenylation site is operably linked to a coding sequence if it is located at the downstream end of the coding sequence such that transcription proceeds through the coding sequence into the polyadenylation sequence. Linking is accomplished by recombinant methods known in the art, e.g., using PCR methodology, by annealing, or by ligation at convenient restriction sites. If convenient restriction sites do not exist, then synthetic oligonucleotide adaptors or linkers are used in accord with conventional practice.

The terms “peptide,” “polypeptide” and “protein” used herein refer to polymers of amino acid residues. These terms also apply to amino acid polymers in which one or more amino acid residues is an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers, those containing modified residues, and non-naturally occurring amino acid polymers. In the present case, the term “polypeptide” encompasses an antibody or a fragment thereof.

By the term “recombinant” polynucleotide or nucleic acid herein is meant a polynucleotide or nucleic acid not normally found in its natural environment, e.g., any nucleic acid comprising at least two sequences that are not present together in nature. A recombinant nucleic acid may be generated in vitro, for example by using the methods of molecular biology, or in vivo, for example by insertion of a nucleic acid at a novel chromosomal location by homologous or non-homologous recombination. It is understood that once a recombinant polynucleotide is made and reintroduced into a host cell or organism, it will replicate non-recombinantly, e.g., using the in vivo cellular machinery of the host cell rather than in vitro manipulations; however, such nucleic acids, once produced recombinantly, although subsequently replicated non-recombinantly, are still considered recombinant for the purposes of the invention.

“Recovering” is used herein to mean a crude separation of a desired species from the rest of the pool, which is not desired.

“Restriction endonucleases” or “restriction enzymes” are enzymes that bind to a double-stranded nucleic acid (e.g., DNA) at one site, referred to as the recognition site, and make a single double-stranded cut. The double-stranded cut, referred to as the restriction site or cleavage site, is generally situated within or at a short distance from the recognition site. Some restriction sites occur frequently in DNA (e.g., every several hundred base pairs), others much less frequently (rare cutters; e.g., every 10,000 base pairs). The recognition site is generally about 4-8 bp long. Cleavage generally produces single-stranded overhangs of 1-6 nucleotides with 5′ or 3′ termini, although some enzymes produce blunt ends. Such enzymes and information regarding their recognition and cleavage sites are available from commercial suppliers such as New England Biolabs. As used herein, restriction sites are “compatible” if, once cleaved by the appropriate restriction enzymes, they can be ligated by a DNA ligase. In some embodiments, the compatible restriction sites include those double-stranded sequences that, once cleaved by the appropriate restriction enzymes, generate “sticky ends” with complementary overhang sequences that can be joined by a DNA ligase.
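A minimal Python sketch of how compatibility can be predicted from recognition and cleavage positions follows. It assumes palindromic Type II sites and covers only a few enzymes, with cut positions as commonly listed by suppliers (e.g., XbaI T^CTAGA); the table and function names are illustrative, not a supplier API. Because palindromic overhangs are self-complementary, two such ends are treated as compatible when their overhang type and sequence match.

```python
# Sketch: predict the end left by a palindromic Type II enzyme and decide
# whether two enzymes give ligation-compatible ends. Small illustrative table.

ENZYMES = {
    # name: (recognition site, top-strand cut offset within the site)
    "XbaI": ("TCTAGA", 1),   # T^CTAGA
    "NheI": ("GCTAGC", 1),   # G^CTAGC
    "SpeI": ("ACTAGT", 1),   # A^CTAGT
    "SalI": ("GTCGAC", 1),   # G^TCGAC
    "XhoI": ("CTCGAG", 1),   # C^TCGAG
    "NcoI": ("CCATGG", 1),   # C^CATGG
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "MfeI": ("CAATTG", 1),   # C^AATTG
    "EcoRV": ("GATATC", 3),  # GAT^ATC (blunt)
    "PstI": ("CTGCAG", 5),   # CTGCA^G (3' overhang)
}


def end_type(enzyme: str) -> tuple[str, str]:
    """Return (overhang type, overhang sequence) for a palindromic site."""
    site, cut = ENZYMES[enzyme]
    n = len(site)
    # For a palindromic site, the bottom strand is cut at position n - cut.
    if cut < n - cut:
        return "5'", site[cut:n - cut]
    if cut > n - cut:
        return "3'", site[n - cut:cut]
    return "blunt", ""


def compatible(a: str, b: str) -> bool:
    """Palindromic overhangs are self-complementary, so identical type and
    sequence mean the ends can anneal and be joined by a ligase."""
    return end_type(a) == end_type(b)


print(end_type("XbaI"), compatible("SalI", "XhoI"), compatible("XbaI", "XhoI"))
# ("5'", 'CTAG') True False
```

For example, SalI and XhoI both leave a 5′ TCGA overhang and are reported as compatible, whereas XbaI (5′ CTAG) and XhoI are not.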

“Screening” as used herein refers to the method in which a pool comprising the desired species is subjected to an assay in which the desired species can be detected, and subsequently an aliquot of the pool in which the desired species is detected and optionally enriched is recovered or obtained.

“Surface protein” or “surface peptide” refers to an amino acid sequence that confers the ability of a polypeptide to be associated with and/or presented on the surface of a host, such as phages, yeasts, bacteria and mammalian cells. The surface protein can be a phage coat protein, a yeast outer wall protein, a bacterial outer membrane protein, a cell surface tether domain, or an adapter, or a truncation or derivative thereof. Surface proteins can be used for phage display, yeast display, bacterial display or mammalian display, or shuttling display between different hosts.

As used herein, unless otherwise stated, the term “transcription” refers to the synthesis of RNA from a DNA template; the term “translation” refers to the synthesis of a polypeptide from an mRNA template. Transcription and translation collectively are known as “expression.”

The term “transfected” or “transformed” or “transduced” as used herein refers to a process by which exogenous nucleic acid is transferred or introduced into the host cell. A transformed cell includes the primary subject cell and its progeny. The host cell can be a bacterial, yeast, mammalian or plant cell.

A “variable region” of an antibody refers to the variable region of the antibody light chain or the variable region of the antibody heavy chain, either alone or in combination. The variable regions of both the light (VL) and heavy (VH) chain portions determine antigen recognition and specificity. VL and VH each consist of four framework regions (FRs) connected by three complementarity determining regions (CDRs), also known as hypervariable regions. The CDRs complement an antigen's shape and determine the antibody's affinity and specificity for the antigen. There are six CDRs in total across VL and VH (three in each). The CDRs in each chain are held together in close proximity by the FRs and, with the CDRs from the other chain, contribute to the formation of the antigen-binding site of antibodies. There are at least two techniques for determining CDRs: (1) an approach based on cross-species sequence variability (the Kabat numbering scheme; see Kabat et al., Sequences of Proteins of Immunological Interest (5th ed., 1991, National Institutes of Health, Bethesda Md.)); and (2) an approach based on crystallographic studies of antigen-antibody complexes (the Chothia numbering scheme, which corrects the sites of insertions and deletions (indels) in CDR-L1 and CDR-H1 suggested by Kabat; see Al-Lazikani et al. (1997) J. Molec. Biol. 273:927-948). Other numbering approaches or schemes can also be used. As used herein, a CDR may refer to CDRs defined by either approach, by a combination of both approaches, or by other desirable approaches. In addition, a new definition of highly conserved core, boundary and hyper-variable regions can be used.

As used herein, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. A vector includes any genetic element, such as a plasmid, phage vector, phagemid, transposon, cosmid, chromosome, artificial chromosome, episome, virus, virion, etc., capable of replication when associated with the proper control elements and which can transfer gene sequences into or between hosts. One type of vector is an episome, i.e., a nucleic acid capable of extra-chromosomal replication. Another type of vector is an integrative vector that is designed to recombine with the genetic material of a host cell. Vectors may be both autonomously replicating and integrative, and the properties of a vector may differ depending on the cellular context (i.e., a vector may be autonomously replicating in one host cell type and purely integrative in another host cell type). Vectors generally contain one or a small number of restriction endonuclease recognition sites and/or sites for site-specific recombination. A foreign DNA fragment may be cleaved and ligated into the vector at these sites. The vector may contain a marker suitable for use in the identification of transformed or transfected cells. For example, markers may provide antibiotic resistance, fluorescent, enzymatic, as well as other traits. As a second example, markers may complement auxotrophic deficiencies or supply critical nutrients not in the culture media.

Other terms used in the fields of recombinant nucleic acid technology, microbiology, immunology, antibody engineering, and molecular and cell biology as used herein will be generally understood by one of ordinary skill in the applicable arts.

Display and Selection

Display technology is used to present different recombinant polypeptides on the surface of a host such as phages, yeasts, bacteria and mammalian cells. Various display vectors are used to express and display an insert in the vector as part of a fusion protein on the surface of the host. By exposing a plurality of such fusion proteins to a target, various in vitro selection processes generally known in the art can then be used to identify affinity binders to the target.

One commonly used selection process is known as biopanning. The first step of biopanning is to construct display libraries. The diversity or size of the library can be 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, or more or fewer different members (e.g., polynucleotides). Library construction involves inserting desired genes or reading frames into display vectors described herein. The inserts can encode an antibody fragment. For example, the inserts can be a plurality of random mutations or degenerate oligonucleotides designed to introduce variations/mutations into an antibody fragment (for example, into each CDR region). The inserts can also be flanked by two restriction sites to facilitate cloning. The restriction sites can be selected to achieve high-efficiency cloning (e.g., above about 80%, above about 85%, above about 90%, or above about 95%).
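As a rough, hypothetical illustration of library size, the sketch below assumes degenerate oligonucleotides of the NNK type (a common design that this description does not specify) and estimates how much of a theoretical diversity is expected to be sampled by a given number of transformants, using the Poisson approximation for equally represented variants.

```python
import math
from itertools import product

# Standard genetic code; codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical NNK randomization: N = A/C/G/T, K = G/T at the third position.
NNK_CODONS = ["".join(c) for c in product("ACGT", "ACGT", "GT")]


def nnk_dna_diversity(positions: int) -> int:
    """Number of distinct DNA sequences when `positions` codons are NNK."""
    return len(NNK_CODONS) ** positions


def expected_coverage(transformants: float, diversity: float) -> float:
    """Expected fraction of equiprobable variants sampled at least once
    (Poisson approximation)."""
    return 1.0 - math.exp(-transformants / diversity)


encoded = {CODON_TABLE[c] for c in NNK_CODONS}
print(len(NNK_CODONS), len(encoded - {"*"}))  # 32 20
print(nnk_dna_diversity(3))                   # 32768
print(round(expected_coverage(3e7, 1e7), 3))  # 0.95
```

Under these assumptions, three-fold oversampling covers roughly 95% of the theoretical diversity; real libraries with codon and transformation biases would be expected to fall short of this ideal.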

In some embodiments, the display library can be constructed using a vector containing a recombinant polynucleotide of the present invention. The recombinant polynucleotide can include, from 5′ to 3′: a first nucleic acid sequence (or insert) encoding an amino acid or polypeptide sequence to be displayed on a surface; a first pre-selected restriction site; a second nucleic acid sequence encoding a surface protein for display; and a second pre-selected restriction site. These components can be operably linked together and to the vector. In some embodiments, when cleaved by the corresponding restriction endonucleases, the first and second restriction sites produce compatible sticky ends.

In certain embodiments, the first and second restriction sites can each be selected to encode amino acids that do not interfere with the binding affinity of the amino acid or polypeptide sequence to a target and/or with display of the surface protein. Where the first nucleic acid sequence encodes an antibody fragment, the first and second restriction sites can be selected according to one or more of the following criteria:

(1) The restriction site is rarely present in human germline Vκ, Vλ or VH;
(2) The restriction site is not present in the library (e.g., the six CDR regions);
(3) After cleavage, a sticky end (e.g., a 4-base sticky end) is exposed. In some instances, a 4-base sticky end is expected to give higher self-ligation efficiency than other sticky ends (e.g., a 2-base sticky end) or blunt ends;
(4) The restriction site does not encode any in-frame stop codons, and it encodes amino acids that are compatible with the structure of the antibody fragment (for example, no Cys is allowed);
(5) The sites are unique in the vector;
(6) The corresponding restriction enzyme can be inactivated (e.g., at a higher temperature such as 65° C.); and/or
(7) The corresponding restriction enzyme is inexpensive yet robust, and lacks star activity.
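Several of the criteria above are sequence checks that could be automated. The Python sketch below screens a candidate site for criterion (2) (absence from library sequences), criterion (4) (no in-frame stop codon and no Cys) and criterion (5) (expected copy number in the vector). It assumes the site is placed codon-aligned in the reading frame; criteria (1), (3), (6) and (7) depend on germline databases and enzyme properties and are not modeled. All function names and sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons in frame 0 ('*' marks a stop codon)."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def count_sites(seq: str, site: str) -> int:
    """Count occurrences of a site on both strands (overlaps included)."""
    total = 0
    for probe in {site, reverse_complement(site)}:  # set: palindromes once
        start = seq.find(probe)
        while start != -1:
            total += 1
            start = seq.find(probe, start + 1)
    return total


def screen_site(site, library_seqs, vector_seq="", expected_vector_copies=None):
    """Return (encoded peptide, list of failed criteria) for a candidate site."""
    problems = []
    peptide = translate(site)  # assumes the site is codon-aligned
    if "*" in peptide:
        problems.append("in-frame stop codon")
    if "C" in peptide:
        problems.append("encodes Cys")
    if any(count_sites(s, site) for s in library_seqs):
        problems.append("present in library")
    if expected_vector_copies is not None and (
        count_sites(vector_seq, site) != expected_vector_copies
    ):
        problems.append("not unique in vector")
    return peptide, problems


for name, site in [("XbaI", "TCTAGA"), ("XhoI", "CTCGAG"), ("MfeI", "CAATTG")]:
    print(name, screen_site(site, library_seqs=["GGTTACAGCTAT"]))
```

Run on XbaI (TCTAGA), XhoI (CTCGAG) and MfeI (CAATTG), the sketch reports Ser-Arg, Leu-Glu and Gln-Leu, respectively, with no failed criteria against the toy library sequence; concerns such as Gln deamidation would need a separate, user-defined rule.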

For example, the first and second restriction sites can be selected to meet all of the criteria above. Such sites, in the case of phagemid Fad22 exemplified herein, include XbaI, NcoI, SalI and XhoI sites. In one embodiment, the first restriction site is an XbaI site. In some embodiments, the second restriction site is an XbaI site. For other vectors, one of ordinary skill in the art would be able to follow the criteria above to select appropriate restriction sites.

In certain embodiments, the first nucleic acid sequence can be an open reading frame and can be engineered in frame with the second nucleic acid sequence such that a fusion protein comprising both encoded sequences is produced upon expression. Such a fusion protein can then be presented on a surface of a host. To facilitate surface presentation, the second nucleic acid sequence can be selected to encode a surface protein such as a phage coat protein, a yeast outer wall protein, a bacterial outer membrane protein, or a cell surface tether domain, or a truncation or derivative thereof. For example, gene III of filamentous phage M13, or a truncation or derivative thereof, can be used. Accordingly, the surface protein enables phage display, yeast display, bacterial display or mammalian display, or shuttling display therebetween.

In various embodiments, a display vector for high-throughput conversion into an expression vector is provided. The display vector can contain the recombinant polynucleotide described above, as well as a fusion tag sequence 3′ to the second restriction site. In some embodiments, the fusion tag sequence can be engineered such that upon (a) removal of the second nucleic acid sequence by cleaving the first and second restriction sites, and (b) religation of the compatible sticky ends produced therefrom, the first nucleic acid sequence is in frame with the fusion tag sequence. Exemplary fusion tag sequences include one or more of: an alkaline phosphatase tag, an AviTag, a cutinase tag, a HaloTag, a FLAG tag, a c-myc tag, a histidine tag, a GST tag, a green fluorescent protein tag, an HA tag, an E-tag, a Strep tag, a Strep tag II, a YoI1/34 tag, and other tags known to one of ordinary skill in the art. Fusion tag sequences can be operably linked to other elements in the vector. In some embodiments, specific restriction sites can be selected to flank the fusion tag sequences, to facilitate exchange and/or cloning. For example, as shown in FIG. 1, a FlagHis6 tag is used in the Fad22 phagemid. The FlagHis6 tag is flanked by a SalI site (e.g., upstream) and an XbaI site (e.g., downstream). The sites can be engineered through silent mutagenesis. In addition, the SalI site is placed immediately downstream of, and is the closest restriction site to, the C-terminus of the CH1 domain. These sites can facilitate the convenient exchange of tags downstream of CH1. Many other restriction sites, in addition to SalI or in replacement of SalI, can also be introduced downstream and outside of the CH1 ORF to facilitate introduction and removal of various vector elements.
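The silent-mutagenesis step can be checked in silico: a mutation is silent if the mutated stretch encodes the same peptide, and it is useful here if it also creates the desired site. The sketch below is a hypothetical example; the Val-Asp-Lys stretch is invented for illustration and is not taken from Fad22 or the CH1 domain.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate complete codons in frame 0."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def is_silent(original: str, mutant: str) -> bool:
    """True if the mutant has the same length and encodes the same peptide."""
    return len(original) == len(mutant) and translate(original) == translate(mutant)


def introduces_site(original: str, mutant: str, site: str) -> bool:
    """True if `site` is absent from the original but present in the mutant."""
    return site not in original and site in mutant


SALI = "GTCGAC"
original = "GTGGACAAA"  # hypothetical stretch: Val-Asp-Lys
mutant = "GTCGACAAA"    # GTG -> GTC keeps Val and creates a SalI site
print(is_silent(original, mutant), introduces_site(original, mutant, SALI))  # True True
```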

During display library construction, a library of display vectors can be used, in which each display vector can contain a unique first nucleic acid sequence that encodes a unique peptide or antibody fragment.

Once a library is constructed, the next step of biopanning is the capturing or panning step, in which the displayed peptide binds to a desired target. Panning utilizes the binding interaction between an affinity binder presented by the host and its target, so that only peptides having binding affinity are bound to the target. The target is usually attached to or fixed on a solid phase, either directly or indirectly. For example, antibodies presented by bacteriophage can be selected with antigen coated in microtiter plates. After the capturing step, a mild washing step is generally used to wash away the unbound hosts that do not present affinity binders on their surfaces; only the bound hosts with the desired affinity remain. The final step involves an elution or recovering step, in which the bound hosts are eluted through protease cleavage, a change of pH or other environmental conditions. The end result is that specific peptides displayed on the bound hosts are collected and analyzed. The cycle can be repeated many times (e.g., with increasingly stringent wash conditions) to screen for strong affinity binders to the target.
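Panning progress is often followed by comparing output and input titers across rounds; this metric is a common practice rather than a step stated in this description. The sketch below computes the per-round yield and the fold enrichment relative to round 1 from hypothetical titers.

```python
def round_yield(output_titer: float, input_titer: float) -> float:
    """Fraction of input phage recovered after elution in one round."""
    return output_titer / input_titer


def fold_enrichment(yields: list[float]) -> list[float]:
    """Yield of each round relative to the first round."""
    return [y / yields[0] for y in yields]


# Hypothetical titers (e.g., colony-forming units) for three panning rounds.
inputs = [1e12, 1e12, 1e12]
outputs = [1e4, 1e6, 1e8]
yields = [round_yield(o, i) for o, i in zip(outputs, inputs)]
print([round(x) for x in fold_enrichment(yields)])  # [1, 100, 10000]
```

A rising yield across rounds is consistent with enrichment of specific binders, although nonspecific "sticky" clones can also increase yield and are usually excluded by separate specificity assays.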

Other in vitro selection processes can also be used. For example, various markers or reporters can facilitate selection. One commonly used reporter is the GFP protein, where affinity binders can be selected through fluorescence-activated cell sorting (FACS).

In some embodiments, after an affinity binder is recovered, the corresponding polynucleotide needs to be expressed, sequenced, and/or further characterized. Using the vectors of the present invention, display vectors can be converted to expression vectors in a simple cleavage-religation step, eliminating the need to subclone the polynucleotide of interest. Thus, the present invention provides an attractive alternative for more efficient and faster downstream evaluation of affinity binders.

Conversion Vector in Phage Display

In the sections below, a detailed description is provided with regard to phagemid vectors. It should be understood by a person of ordinary skill in the art, however, that the description is equally applicable to other types of vectors using standard molecular cloning techniques. For example, discussions about the phage (e.g., M13 phage) coat proteins (e.g., GP3, GP8 or GP7) are equally applicable to other surface proteins such as yeast outer wall proteins, bacterial outer membrane proteins, cell surface tether domains or adapters. It should also be understood that the description below is applicable to various antibodies and antibody fragments (including, for example, Fab, Fab′, Fd, Fd′, Fv, dAb, isolated CDR regions, F(ab′)₂, and scFv). Furthermore, the antibody fragments identified using the vectors and methods of the present invention can be used to construct various forms of antibodies, such as humanized antibodies, chimeric antibodies, and any other modified configuration of the immunoglobulin molecule that comprises an antigen recognition site of the required specificity.

Similarly, although the description provided below focuses on isolation of a polynucleotide encoding an antibody or antibody fragment specifically recognizing an antigen, the methods are equally applicable to isolation of polynucleotides encoding a polypeptide (such as an antibody) with other desired properties, which include, for example, specific binding to a partner, higher binding affinity to a binding partner, antibody-dependent cellular cytotoxicity (ADCC), complement-dependent cytotoxicity (CDC), agonist or antagonist functions, induction or inhibition of apoptosis, angiogenesis or proliferation, and activation or inhibition of a signaling pathway. Multiple properties may be screened simultaneously or individually. Assay methods for these desired properties are known in the art.

The phage display method is a technology used for studying interactions of proteins. This technology is based on expressing the binding protein of interest on the surface of a phage (display) and selecting said binding protein on its capacity to form a complex with a binding partner. In general, there are two types of vectors for displaying the binding proteins, phage vectors and phagemid vectors, resulting in multivalent phage or monovalent phage, respectively.

In the first method, involving a phage vector, the principle relies on the genetic recombination of the phage genome: a sequence encoding a binding protein of interest is inserted into said phage genome. The inserted sequence is located next to a gene encoding a protein forming the coat protein complex of the phage. Said coat is composed of different proteins, such as the GP3 and GP8 proteins, which are the most commonly used. The insertion of a sequence of interest next to the gene encoding these coat proteins enables the fusion of the binding protein of interest to the coat protein of the phage. The recombinant phage (i.e., a phage vector) then infects bacteria, and its genome is replicated therein. The expression of the recombinant phage genome leads to the production of phages expressing on their surface the heterologous binding protein to be screened, in which all copies of the coat protein display the heterologous protein (i.e., the heterologous protein is displayed in multiple copies on a given phage, or multivalently). During the screening steps, different proteins or molecules, referred to as targets or binding partners, are brought into contact with said protein of interest. When a complex is formed between the binding protein on the surface of the phage and a binding partner, the complex is purified and the nucleotide sequence encoding the binding protein of interest can then be determined from the recombinant phage genome.

Exemplary phage vectors include M13IX30, M13IX11, M13IX34, M13IX13 or M13IX60 described in International Patent Publication No. WO 92/06204, and any derived phage vectors such as the vector 668-4 used in a method for generating a multivalent display library described in U.S. Pat. No. 6,057,098. Although most phage display methods have used filamentous phage, lambdoid phage display systems (WO 95/34683; U.S. Pat. No. 5,627,024), T4 phage display systems (Ren et al., Gene, 215: 439 (1998); Zhu et al., Cancer Research, 58(15): 3209-3214 (1998); Jiang et al., Infection & Immunity, 65(11): 4770-4777 (1997); Ren et al., Gene, 195(2): 303-311 (1997); Ren, Protein Sci., 5: 1833 (1996); Efimov et al., Virus Genes, 10: 173 (1995)) and T7 phage display systems (Smith and Scott, Methods in Enzymology, 217: 228-257 (1993); U.S. Pat. No. 5,766,905) are also known.

The second phage display method combines the use of phagemid (display) vectors and helper phages for the generation of libraries of phages expressing on their surface a binding protein of interest. Phagemid vectors comprise the sequence encoding the binding protein of interest and phage sequences, especially the sequence encoding the coat protein to be fused with the binding protein of interest (i.e., the displayed protein fusion gene), cloned into a small plasmid. Phagemid vectors also contain different functional sequences for the replication of the phage genome (e.g., a phage replication origin such as the Ff or f1 origin) and for the maintenance of the vector in the host cell (e.g., a plasmid replication origin such as pBR322 or pMB1). Phagemid vectors do not contain the whole phage genome, which is why this method is combined with the use of a helper phage for the production of phages expressing binding proteins as a fusion on their surface. The helper phage, upon infection of the host cell (e.g., E. coli), enables the replication and packaging of the phage genome by supplying the proteins of the complete phage genome that are absent from the phagemid. Commercially available helper phages include M13K07 from New England Biolabs and R408 and VCSM13 from Stratagene, which typically have mutations that reduce packaging efficiency, to ensure that phagemid genomes are preferentially packaged. In addition, engineered helper phage carrying a complementary adapter or adapters can be used to prevent contamination from unrelated phage or phagemid particles. As a result of the use of helper phages, all wild-type phage proteins from the helper phage genome, as well as a small amount of the fusion protein encoded by the phagemid, are expressed, so that phage particles extruded by the cells contain both proteins, usually with the wild-type protein in considerable excess.

A typical preparation of phage particles from E. coli harboring a GP3 phagemid display vector and infected with helper phage will exhibit a Poisson distribution of fusion protein expression: 10% or less of the particles will display one copy of the fusion protein; a very small percentage will display two copies; and the remaining majority of the particles will display only wild-type GP3 and no fusion protein. Thus, the main displaying species is monovalent, while most particles do not display at all.
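The Poisson description above can be made concrete. Assuming, hypothetically, a mean of 0.1 fusion copies per particle, the Poisson probabilities reproduce the stated pattern: roughly 90% of particles without fusion protein, about 9% with one copy and well under 1% with two.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a particle carries exactly k fusion copies."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


MEAN_FUSION_COPIES = 0.1  # hypothetical mean copies per phage particle

for k in range(3):
    print(k, round(poisson_pmf(k, MEAN_FUSION_COPIES), 4))
# 0 0.9048
# 1 0.0905
# 2 0.0045
```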

The valency of display is important primarily because of its impact on the ability to discriminate binders of differing affinities. Early work showed that multivalent display prevented the highest-affinity clones in a selection from being identified, because multivalency conferred a high apparent affinity (avidity) on weak-binding clones. Monovalent display allows selection based on pure affinity, and is therefore generally preferred for the many studies where the aim is to identify the tightest-binding variants from a library. Conversely, in applications where the initial selectants are of low affinity, for example the de novo selection of peptides that bind a given target, multivalency increases the chances of isolating rare and weakly binding clones. A frequently used strategy is to start with multivalent display, and then move to monovalent display as the affinity of the displayed peptides matures.
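The affinity argument for monovalent display can be sketched with the simple 1:1 binding isotherm, in which the fraction of a clone bound is [T]/([T] + Kd) when free target is in excess. This is a simplification that ignores avidity, kinetics and washing, and the Kd values and target concentrations are hypothetical. It illustrates how lowering the target concentration widens the gap between a tight and a weak monovalent binder (about 10-fold at 10 nM target versus about 50-fold at 1 nM).

```python
def fraction_bound(target_nM: float, kd_nM: float) -> float:
    """Fractional occupancy for 1:1 binding with free target in excess."""
    return target_nM / (target_nM + kd_nM)


# Two hypothetical monovalent clones: Kd = 1 nM (tight) and Kd = 100 nM (weak).
for target in (10.0, 1.0):
    strong = fraction_bound(target, 1.0)
    weak = fraction_bound(target, 100.0)
    print(f"target={target} nM strong={strong:.3f} weak={weak:.3f} "
          f"ratio={strong / weak:.1f}")
```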

The use of a helper phage can be eliminated by using the bacterial packaging cell line technology, which is described in detail in U.S. Pat. No. 8,227,242, the entire disclosure of which is incorporated herein by reference.

In addition to selecting suitable vectors for constructing a library of mutants, sorting phage libraries of mutants also requires a strategy for constructing and propagating a large number of variants, a procedure for affinity purification using the target receptor, and a means of evaluating the results of binding enrichments. See U.S. Pat. Nos. 5,223,409; 5,403,484; 5,571,689; and 5,663,143.

Many other improvements and variations of the basic phage display concept have now been developed. These improvements enhance the ability of display systems to screen peptide libraries for binding to selected target molecules and to display functional proteins with the potential of screening these proteins for desired properties. Combinatorial reaction devices for phage display reactions have been developed (WO 98/14277), and phage display libraries have been used to analyze and control biomolecular interactions (WO 98/20169; WO 98/20159) and properties of constrained helical peptides (WO 98/20036). WO 97/35196 describes a method of isolating an affinity ligand in which a phage display library is contacted with one solution in which the ligand will bind to a target molecule and a second solution in which the affinity ligand will not bind to the target molecule, to selectively isolate binding ligands. WO 97/46251 describes a method of biopanning a random phage display library with an affinity purified antibody and then isolating binding phage, followed by a micropanning process using microplate wells to isolate high affinity binding phage. WO 97/47314 describes the use of substrate subtraction libraries to distinguish enzyme specificities using a combinatorial library which may be a phage display library. A method for selecting enzymes suitable for use in detergents using phage display is described in WO 97/09446. Additional methods of selecting specific binding proteins are described in U.S. Pat. Nos. 5,498,538 and 5,432,018, and WO 98/15833. In addition, adapter-directed display and cross-species display, including but not limited to those described in Wang et al., Adapter-Directed Display: A Modular Design for Shuttling Display on Phage Surfaces, J. Mol. Biol. (2010) 395, 1088-1101 and Wang et al., Yeast surface display of antibodies via the heterodimeric interaction of two coiled-coil adapters, Journal of Immunological Methods (2010) 354, 11-19, can also be used in the present invention.

Methods of generating peptide libraries and screening these libraries are also disclosed in U.S. Pat. Nos. 5,723,286; 5,432,018; 5,580,717; 5,427,908; 5,498,530; 5,770,434; 5,734,018; 5,698,426; 5,763,192; and 5,723,323.

Typically, phage display is achieved through fusion of an antibody fragment to one of the coat proteins on M13 phage, such as GP3, GP8 or GP7. In one embodiment of the present invention, a phagemid named Fad22 is constructed, in which the GP3 C-terminal (“CT”) fragment is used instead of the full-length GP3 protein. The GP3 CT fragment has been shown to be less toxic than full-length GP3, while being equally efficient in displaying antibody fragments on the phage surface.

Using recombinant polynucleotides and vectors described herein, phage display libraries can be constructed. For example, restriction sites can be selected to flank the GP3 CT fragment. Such sites, in the case of phagemid Fad22 exemplified herein, include XbaI, NcoI, SalI and XhoI sites. In one embodiment, the first restriction site is an XbaI site. In some embodiments, the second restriction site is an XbaI site.

A desired affinity binder (e.g., an antibody fragment) to a target can be identified through screening and recovering steps such as the biopanning process described above. Next, it is often necessary to express the antibody fragment without the GP3 CT fragment in a host (e.g., E. coli) in order to further purify and characterize the antibody fragment. There are generally two approaches to achieve this goal.

The first approach is a genetic approach, in which an amber stop codon (TAG) is inserted between the gene encoding the antibody fragment and that encoding the GP3 fragment. In a bacterial suppressor strain (such as TG1 or ER2738) used for phage screening, the genetic allele supE suppresses the amber stop codon and inserts an amino acid (usually glutamine) instead, thus allowing surface display of the antibody fragment through the GP3 fragment. In a non-suppressing bacterial strain (such as HB2151) commonly used for expression, however, the bacterial ribosome stops at the amber stop codon between the antibody fragment and the GP3 fragment, which results in the synthesis and export to the bacterial periplasm of the antibody fragment, but not of its fusion with the GP3 fragment. The advantage of this approach is that no further manipulation of the display vector is required. Instead, the antibody fragment can be expressed in E. coli by simply moving the same display vector from an amber-suppressing strain to a non-suppressing strain. However, this approach also has several disadvantages: (a) the suppression efficiency is strain dependent and usually less than 30%, which reduces phage display efficiency; (b) the amber stop codon is not a strong termination codon, and thus there is leaky read-through even in a non-suppressing strain, resulting in a lower yield of soluble antibody fragments in the bacterial periplasm; and (c) because antibody fragments containing internal amber stop codons are also displayed and selected in suppressing strains, these internal amber stop codons must be replaced with other codons through a laborious and expensive site-directed mutagenesis process in order for these antibody fragments to be expressed in non-suppressing strains.
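The difference between suppressing and non-suppressing strains can be sketched as two translation modes. The sketch is idealized: it assumes complete suppression with Gln at every amber codon in the supE mode and complete termination otherwise, so it does not capture partial suppression or leaky read-through. The sequence is a hypothetical stub. The helper that lists amber codon positions corresponds to disadvantage (c): any TAG found before the intended junction would need to be mutated before expression in a non-suppressing strain.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf: str, suppress_amber: bool = False) -> str:
    """Translate from the first base until the first stop codon.

    With suppress_amber=True, TAG is read as Gln (idealized supE host).
    """
    peptide = []
    usable = len(orf) - len(orf) % 3
    for i in range(0, usable, 3):
        codon = orf[i:i + 3]
        if codon == "TAG" and suppress_amber:
            peptide.append("Q")
            continue
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def amber_codon_indices(orf: str) -> list[int]:
    """Codon indices (0-based) of every in-frame TAG."""
    usable = len(orf) - len(orf) % 3
    return [i // 3 for i in range(0, usable, 3) if orf[i:i + 3] == "TAG"]


fusion = "ATGGCT" + "TAG" + "GAAACT"  # hypothetical: binder stub, amber, GP3 stub
print(translate(fusion, suppress_amber=True))  # MAQET (fusion displayed)
print(translate(fusion))                       # MA (fragment only)
```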

The second, “cleavage-religation” approach differs from the above-mentioned approach in that, after the display vector is extracted from E. coli, the vector is digested with a specific restriction enzyme to release the GP3 fragment, and the vector is then self-ligated using T4 DNA ligase, followed by transformation into specific competent E. coli strains for expression. Pershad et al. (Anal. Biochem. 412 (2011): 210-216) used MfeI, an infrequently used and relatively expensive restriction enzyme, to excise the GP3 coding sequence from a phage display vector. MfeI is also known to display star activity (NEB), and its High-Fidelity version, MfeI-HF, reduces but does not eliminate the star activity. In addition, the MfeI restriction site, CAATTG, encodes Gln and Leu, and Gln is known to be susceptible to post-translational modifications such as deamidation, which may adversely affect the structure and function of the affinity binder.

In contrast to MfeI, the vector of the present invention utilizes a restriction site having one or more of the following advantages:

(1) The restriction site is rarely present in human germline Vκ, Vλ or VH;
(2) The restriction site is not present in the library (e.g., the six CDR regions);
(3) The restriction site encodes amino acids that are compatible with the structure of the affinity binder, such as an antibody fragment;
(4) The sites are unique in the vector; and/or
(5) The corresponding restriction enzyme is inexpensive yet robust, and lacks star activity.
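Criteria (1), (2) and (4) come down to counting occurrences of a recognition sequence in the germline or library sequences and in the vector. The following is a minimal sketch that assumes sequences are plain strings of A, C, G, T (and N). Both strands are searched, and a palindromic site such as XbaI (TCTAGA) is counted once per location. The vector is treated as circular. The helper name `site_is_usable` and the toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_sites(seq: str, site: str, circular: bool = False) -> int:
    """Count locations of `site` on either strand of `seq`.

    A palindromic site (e.g. XbaI, TCTAGA) is counted once per location.
    With circular=True, a site spanning the end/start junction is also found.
    """
    seq, site = seq.upper(), site.upper()
    if circular:
        seq += seq[: len(site) - 1]
    targets = {site, reverse_complement(site)}
    return sum(seq[i:i + len(site)] in targets for i in range(len(seq) - len(site) + 1))


def site_is_usable(site: str, vector: str, screened_seqs: list[str],
                   designed_copies: int = 2) -> bool:
    """Hypothetical check: the site occurs in the circular vector only at the
    designed positions, and never in the screened (germline or library) sequences."""
    if count_sites(vector, site, circular=True) != designed_copies:
        return False
    return all(count_sites(s, site) == 0 for s in screened_seqs)


vector = "AAATCTAGAGGGCCCTCTAGATTT"        # toy vector with two XbaI sites
cdrs = ["GGCTACACCTTC", "GCAAGAGATTAC"]  # hypothetical CDR sequences
print(site_is_usable("TCTAGA", vector, cdrs))
```

A site can also arise at the junction between a CDR and its flanking framework. For that reason, the check is more informative when it runs on assembled coding sequences than on isolated CDRs.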

Using the recombinant polynucleotides and vectors of the present invention, display vectors can be converted to expression vectors in a high-throughput, high-efficiency fashion. In some embodiments, a method of converting a display vector to an expression vector is provided. The method includes: providing the display vector described herein; cleaving the first and second restriction sites with the corresponding restriction endonuclease, thereby producing compatible sticky ends; and religating the compatible sticky ends to produce an expression vector in which the first nucleic acid sequence is in-frame with the fusion tag sequence.
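In sequence terms, the conversion method deletes everything between the two identical sites and leaves one regenerated copy of the site at the junction. The fusion tag must then be in frame with the first nucleic acid sequence. Below is a minimal sketch assuming XbaI sites and a linear representation in which neither site spans the ends of the string. The construct is hypothetical: a GGT...TCT stand-in for GP3CT, the TGA TAA stops, and a hexahistidine tag.

```python
def excise_between_sites(vector: str, site: str = "TCTAGA") -> str:
    """Simulate cutting at two identical sites and religating the compatible ends.

    The stretch from the first site up to the second site is dropped, leaving a
    single regenerated copy of the site at the junction. Assumes exactly two
    copies on this strand and that neither spans the end of the string.
    """
    v = vector.upper()
    first = v.find(site)
    second = v.find(site, first + 1) if first != -1 else -1
    if second == -1 or v.find(site, second + 1) != -1:
        raise ValueError("expected exactly two copies of the site")
    return v[:first] + v[second:]


def tag_in_frame_after_excision(first: int, second: int, tag_start: int) -> bool:
    """True if a tag starting at 0-based `tag_start` lands on a codon boundary of an
    insert that starts at position 0, once the second - first bases are removed."""
    return (tag_start - (second - first)) % 3 == 0


insert = "ATGAAA"                  # hypothetical 2-codon insert
gp3_and_stops = "GGTTCT" "TGATAA"  # stand-in for GP3CT, then TGA TAA
his_tag = "CATCAC" * 3             # hypothetical hexahistidine tag
display = insert + "TCTAGA" + gp3_and_stops + "TCTAGA" + his_tag

expression = excise_between_sites(display)
first = display.find("TCTAGA")
second = display.find("TCTAGA", first + 1)
print(expression)
print(tag_in_frame_after_excision(first, second, display.find(his_tag)))
```

In the display construct the reading frame runs ATG AAA TCT AGA GGT TCT TGA, so the downstream tag is not translated. After excision, ATG AAA TCT AGA is followed directly by the six His codons. The frame condition is that the tag's start position, minus the number of bases removed, falls on a codon boundary of the insert.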

The method can further include cleaving within the second nucleic acid sequence to increase religation efficiency in the religating step. Such cleaving may produce blunt ends, which are less likely to ligate than sticky ends, so the excised second nucleic acid sequence is less likely to be religated back into the vector.

In some embodiments, the product of the cleaving step can be diluted before the religating step to increase religation efficiency. The dilution can be with water (e.g., 1:10, 1:20 or 1:100) to favor intramolecular interaction (i.e., vector self-religation) over intermolecular interactions (i.e., ligation between the vector and the GP3 fragment).
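The dilution step is simple arithmetic. The sketch below picks the smallest of the dilutions mentioned above that brings a sample to or below a target DNA concentration. The 2 ng/μl default comes from the XbaI-only result in the Examples, where near-complete conversion was seen at or below 2 ng/μl. Reading “1:10” as a tenfold dilution (1 part sample in 10 parts total) is an assumption, and the function names are hypothetical. The sketch tracks DNA concentration only.

```python
from typing import Optional, Sequence


def diluted_concentration(conc_ng_per_ul: float, dilution_factor: float) -> float:
    """DNA concentration after an n-fold dilution (1:n read as 1 part in n total)."""
    if dilution_factor < 1:
        raise ValueError("dilution factor must be >= 1")
    return conc_ng_per_ul / dilution_factor


def smallest_sufficient_dilution(
    conc_ng_per_ul: float,
    target_ng_per_ul: float = 2.0,                # from the XbaI-only result (assumed target)
    options: Sequence[float] = (1, 10, 20, 100),  # 1 = no dilution
) -> Optional[float]:
    """Smallest listed factor that brings the sample to <= target, else None."""
    for factor in sorted(options):
        if diluted_concentration(conc_ng_per_ul, factor) <= target_ng_per_ul:
            return factor
    return None


for conc in (1.5, 15.0, 30.0, 150.0, 500.0):  # hypothetical digest concentrations
    print(conc, "ng/ul ->", smallest_sufficient_dilution(conc))
```

A sample that no listed dilution can bring under the target (500 ng/μl here) returns None. Such a sample would need a larger dilution or a different approach.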

After the religating step, the expression vector can be introduced into a host to further characterize the first nucleic acid sequence, for example by expressing the corresponding peptide in the host or by sequencing the first nucleic acid sequence. In some embodiments, the providing, cleaving, religating and introducing steps can be completed in less than 12 hours, less than 8 hours, or less than 4 hours.

In the context of library screening and selection, a plurality of display vectors (e.g., hundreds or thousands) can be converted to a plurality of expression vectors in parallel, and the plurality of expression vectors can be introduced into a population of hosts. Thus, methods of the present invention can be performed in a high-throughput fashion. In some embodiments, the conversion rate from the plurality of display vectors to the plurality of expression vectors can be higher than 80%, higher than 85%, higher than 90%, higher than 92%, higher than 95% or higher than 98%. Because the conversion rate approaches 100%, hundreds, thousands or even millions of individual colonies can be processed in parallel (e.g., in 96-well plates). This eliminates the need to characterize each colony individually and increases throughput dramatically.
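Parallel processing in 96-well plates mainly needs a consistent sample-to-well mapping and a way to report the fraction of samples converted. This is a minimal sketch. The text does not say how each well is scored as converted (for example, by restriction analysis or sequencing), so the boolean results here are hypothetical. The row-by-row fill order is also an assumption.

```python
ROWS = "ABCDEFGH"  # standard 8 x 12 layout of a 96-well plate


def well_for(index: int) -> tuple[int, str]:
    """Map a 0-based sample index to (plate number, well), filling A1..A12, B1.. row by row."""
    plate, pos = divmod(index, 96)
    row, col = divmod(pos, 12)
    return plate + 1, f"{ROWS[row]}{col + 1}"


def conversion_rate(converted: list[bool]) -> float:
    """Fraction of display vectors scored as converted to expression vectors."""
    if not converted:
        raise ValueError("no samples")
    return sum(converted) / len(converted)


print(well_for(0), well_for(95), well_for(96))
calls = [True] * 94 + [False] * 2  # hypothetical per-well results for one plate
print(f"conversion rate: {conversion_rate(calls):.1%}")
```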

A method of identifying an affinity binder to a target is additionally provided by the present invention. The method can include screening a population of first hosts, each containing the display vector described herein, to obtain a subpopulation of first hosts having binding affinity to a target. Each first host displays, on its surface, an amino acid sequence encoded by a unique first nucleic acid sequence in the display vector. The first hosts of the subpopulation each display an affinity binder to the target. The method also includes converting display vectors isolated from the subpopulation of first hosts to expression vectors by cleaving the first and second restriction sites to remove the second nucleic acid sequence and religating the compatible sticky ends produced therefrom. The method additionally includes introducing the expression vectors into a population of second hosts to further characterize the affinity binder. In various embodiments, the method is performed in a high-throughput fashion. For example, the conversion rate of the converting step can be higher than 80%, higher than 85%, higher than 90%, higher than 92%, higher than 95% or higher than 98%.

Various aspects of the present invention may be used alone, in combination, or in a variety of arrangements not specifically discussed in the embodiments described in the foregoing. The invention is therefore not limited in its application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. For example, aspects described in one embodiment may be combined in any manner with aspects described in other embodiments.

Examples

In an exemplary design of Fad22 (FIG. 1), two XbaI restriction sites were engineered to flank the GP3 C-terminal fragment (labeled “GP3CT”). One XbaI site (XbaI (4116)) is placed upstream of GGT codon 253 (encoding Gly) of GP3, while the other XbaI site (XbaI (3645)) is positioned downstream of two stop codons (TGA TAA) placed after TCT codon 406 (encoding Ser) of GP3. The XbaI restriction site was chosen because it fulfills the following criteria:

(1) It is rarely present in human germline Vκ, Vλ or VH;
(2) It is not present in the library (e.g., the six CDR regions);
(3) After cleavage, a 4-base sticky end, rather than a 2-base sticky end or a blunt end, is exposed; in some embodiments, a 4-base sticky end can increase self-ligation efficiency;
(4) The XbaI site (TCTAGA) encodes the amino acids serine and arginine, which are compatible with the structure of the antibody fragment;
(5) The two XbaI sites are the only XbaI sites present in Fad22;
(6) The restriction enzyme XbaI can be inactivated at 65° C.;
(7) The restriction enzyme XbaI is inexpensive yet robust, and lacks star activity.
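Criterion (3) and the compatible-sticky-end requirement in the claims both depend on where each enzyme cuts within its site. The sketch below derives the overhang from the top-strand cut offset of a palindromic recognition site. The cut positions listed are the standard ones for these enzymes, and the helper names are hypothetical. One consequence the sketch shows is that SalI and XhoI ends are mutually compatible, while XbaI and NcoI ends are not.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


# (recognition site, top-strand cut offset); for these palindromic sites the
# bottom strand is cut at len(site) - offset, measured on the top strand.
ENZYMES = {
    "XbaI": ("TCTAGA", 1),      # T^CTAGA
    "NcoI": ("CCATGG", 1),      # C^CATGG
    "SalI": ("GTCGAC", 1),      # G^TCGAC
    "XhoI": ("CTCGAG", 1),      # C^TCGAG
    "XmnI": ("GAANNNNTTC", 5),  # GAANN^NNTTC
}


def end_type(enzyme: str) -> tuple[str, str]:
    """Return (overhang bases, kind) left by the cut; kind is "5'", "3'" or "blunt"."""
    site, top = ENZYMES[enzyme]
    bottom = len(site) - top
    if top < bottom:
        return site[top:bottom], "5'"
    if top > bottom:
        return site[bottom:top], "3'"
    return "", "blunt"


def compatible(enzyme_a: str, enzyme_b: str) -> bool:
    """Ends can anneal if they are the same kind and their overhangs are complementary."""
    (ov_a, kind_a), (ov_b, kind_b) = end_type(enzyme_a), end_type(enzyme_b)
    return kind_a == kind_b and ov_a == reverse_complement(ov_b)


for name in ENZYMES:
    print(name, end_type(name))
print("SalI/XhoI compatible:", compatible("SalI", "XhoI"))
print("XbaI/NcoI compatible:", compatible("XbaI", "NcoI"))
```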

Still referring to FIG. 1, the Fad22 phagemid contains two origins of replication, one for phage replication (f1 ori) and the other for plasmid replication (pMB1 ori). Fad22 also contains a Bla marker gene, which encodes β-lactamase and confers resistance to β-lactam antibiotics (penicillin, ampicillin, etc.). A lac promoter (Plac) is placed upstream of the fusion gene, which consists of an antibody fragment encoding gene and the GP3CT gene. The antibody fragment (e.g., Fab) encoding gene can comprise various domains of interest, such as the Vκ1, CL, VH3 and CH1 domains shown in FIG. 1. These domains can be arranged in any suitable order for expression of the antibody fragment. Each domain has one or more mutations introduced therein, producing a library of mutants. Two ribosome binding sites (RBS1 and RBS2) and two secretion signal sequences (SS1 and SS2) are also included in Fad22, upstream of Vκ1-CL and VH3-CH1, respectively, to facilitate their translation and secretion.

In one example, after 20-fold overdigestion of Fad22 with XbaI, more than 95% of the DNA fragments can be ligated and recut. Additional enzymes that can be used include NcoI, SalI and XhoI.

It was demonstrated that, with XbaI digestion followed by self-ligation with T4 DNA ligase, almost 100% conversion of display vector into expression vector can be achieved when the starting phagemid concentration is equal to or lower than 2 ng/μl (FIG. 2). However, when the starting phagemid concentration is at or higher than 10 ng/μl, the conversion rate is lower than 50% (FIG. 2). This result may be explained by higher DNA concentrations favoring intermolecular ligation over intramolecular self-ligation.

Conversion efficiency is further improved by using a combination of two restriction enzymes, such as XbaI and XmnI. The XmnI site (GAATAATTTC) is present in the gene encoding the GP3 C-terminal fragment, and digestion with XmnI yields blunt ends that are more difficult to ligate than sticky ends (such as those produced by XbaI). Experiments demonstrated that with the combination of XbaI and XmnI, followed by self-ligation with T4 DNA ligase, almost 100% conversion of display vector into expression vector was achieved even when the starting phagemid concentration was about 10 ng/μl (FIG. 3). This is significantly higher than the conversion rate obtained at 10 ng/μl with XbaI alone, which was lower than 50% (FIG. 2). Therefore, the combination of XbaI and XmnI dramatically extends the usable range of starting phagemid concentrations and improves the robustness of this cleavage-religation approach. This is especially important in high-throughput settings, where hundreds or even thousands of phagemid samples are processed in parallel and the starting phagemid concentrations routinely vary by several fold.
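One plausible way to see why adding XmnI helps is to look at the ends each fragment carries. In a combined digest, the GP3CT region is split into two pieces that each carry one blunt XmnI end. The backbone still carries two compatible XbaI ends and can close on itself. This is a minimal sketch on a hypothetical circular toy sequence with a single XmnI site inside a GP3 stand-in. A real vector would need to be checked for additional XmnI sites, and sites spanning the start of the sequence string are ignored.

```python
import re

# (regex for the recognition site, top-strand cut offset from the site start)
DIGEST = {
    "XbaI": ("TCTAGA", 1),           # T^CTAGA: 4-base 5' overhang
    "XmnI": ("GAA[ACGT]{4}TTC", 5),  # GAANN^NNTTC: blunt
}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """Top-strand cut positions; the lookahead also finds overlapping sites."""
    pattern, offset = DIGEST[enzyme]
    return [m.start() + offset for m in re.finditer(f"(?={pattern})", seq.upper())]


def circular_fragments(seq: str, enzymes: list[str]) -> list[tuple[str, str, int]]:
    """Digest a circular sequence; return (left end, right end, length) per fragment.

    Simplification: sites spanning the string's end/start junction are not detected.
    """
    cuts = sorted((pos, name) for name in enzymes for pos in cut_positions(seq, name))
    fragments = []
    for i, (pos, name) in enumerate(cuts):
        next_pos, next_name = cuts[(i + 1) % len(cuts)]
        length = (next_pos - pos) % len(seq) or len(seq)  # one cut -> full-length linear
        fragments.append((name, next_name, length))
    return fragments


# Hypothetical circular toy: insert, XbaI, GP3 stand-in containing an XmnI site,
# TGA TAA stops, XbaI, tag.
toy = "ATGAAA" "TCTAGA" "GGT" "GAATAATTTC" "TCT" "TGATAA" "TCTAGA" "CATCAC"
for fragment in circular_fragments(toy, ["XbaI", "XmnI"]):
    print(fragment)
```

Only one fragment is bounded by XbaI on both ends: the 18-base backbone carrying the insert and tag. In this model, restoring the display vector would require joining the two GP3 pieces through their blunt ends.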

EQUIVALENTS

The present invention provides, among other things, novel methods and vectors for high-throughput screening and selection. While specific embodiments of the subject invention have been discussed, the above specification is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this specification. The full scope of the invention should be determined by reference to the claims, along with their full scope of equivalents, and the specification, along with such variations.

INCORPORATION BY REFERENCE

All publications, patents and sequence database entries mentioned herein are hereby incorporated by reference in their entirety as if each individual publication or patent was specifically and individually indicated to be incorporated by reference.

1. A recombinant polynucleotide comprising from 5′ to 3′: a first nucleic acid sequence encoding an amino acid sequence to be displayed on a surface; a first restriction site selected from the group consisting of XbaI, NcoI, SalI and XhoI sites; a second nucleic acid sequence encoding a surface peptide capable of being displayed on said surface; and a second restriction site selected from the group consisting of XbaI, NcoI, SalI and XhoI sites; wherein the first nucleic acid sequence is engineered in-frame with the second nucleic acid sequence; wherein the first and second restriction site, when cleaved by corresponding restriction endonuclease thereto, produce compatible sticky ends.
2. The recombinant polynucleotide of claim 1, wherein the first nucleic acid sequence encodes an antibody fragment.
3. The recombinant polynucleotide of claim 1, wherein the first and second restriction site each encode amino acids that do not interfere with binding affinity of the amino acid sequence.
4. The recombinant polynucleotide of claim 1, wherein the first and second restriction site each encode amino acids that do not interfere with display of the surface peptide.
5. The recombinant polynucleotide of claim 1, wherein the first restriction site is XbaI site.
6. The recombinant polynucleotide of claim 1, wherein the second restriction site is XbaI site.
7. The recombinant polynucleotide of claim 1, wherein the second nucleic acid sequence encodes a phage coat protein, a yeast outer wall protein, a bacterial outer membrane protein, a cell surface tether domain, or an adapter, or a truncation or derivative thereof.
8. The recombinant polynucleotide of claim 1, wherein the second nucleic acid sequence is gene III of filamentous phage M13, or a truncation or derivative thereof.
9. The recombinant polynucleotide of claim 1, wherein the second nucleic acid sequence encodes an adapter capable of binding to a binding partner, wherein said binding partner is expressed as a fusion and directly displayed on the surface.
10. The recombinant polynucleotide of claim 1, wherein the surface peptide is for phage display, yeast display, bacterial display or mammalian display, or shuttling display therebetween.
11. The recombinant polynucleotide of claim 1, wherein when expressed, the amino acid sequence and the surface peptide are displayed as a fusion protein on the surface.
12. A display vector for high-throughput conversion into expression vector, comprising: the recombinant polynucleotide of claim 1; and a fusion tag sequence 5′ to the first restriction site or 3′ to the second restriction site.
13. The display vector of claim 12, wherein when the fusion tag sequence is 3′ to the second restriction site, the fusion tag sequence is engineered such that upon (a) removal of the second nucleic acid sequence by cleaving the first and second restriction site, and (b) religation of the compatible sticky ends produced therefrom, the first nucleic acid sequence is in-frame with the fusion tag sequence.
14. The display vector of claim 12, wherein the fusion tag sequence is selected from one or more of: an alkaline phosphatase tag, an AviTag, a cutinase tag, a halotag, a flag tag, a c-myc tag, a histidine tag, a GST tag, a green fluorescent protein tag, an HA tag, an E-tag, a Strep tag, a Strep tag II and a YoI 1/34 tag.
15. The display vector of claim 12, provided in a library of display vectors in which each display vector has a unique first nucleic acid sequence.
16. A method of converting a display vector to an expression vector, comprising: providing the display vector of claim 12; cleaving the first and second restriction site with corresponding restriction endonuclease thereto, thereby producing said compatible sticky ends; and religating the compatible sticky ends to produce an expression vector in which the first nucleic acid sequence is in-frame with the fusion tag sequence.
17. The method of claim 16, further comprising cleaving within the second nucleic acid sequence to increase religation efficiency in the religating step.
18. The method of claim 16, further comprising, before the religating step, diluting a product from the cleaving step.
19. The method of claim 16, further comprising, after the religating step, introducing said expression vector into a host, to further characterize the first nucleic acid sequence.
20. The method of claim 19, wherein said further characterization includes sequencing the first nucleic acid sequence and/or expressing the first nucleic acid sequence.
21. The method of claim 19, further comprising converting a plurality of display vectors to a plurality of expression vectors in parallel, and introducing said plurality of expression vectors into a population of hosts, thereby performing said method in a high-throughput fashion.
22. The method of claim 21, wherein a conversion rate from the plurality of display vectors to the plurality of expression vectors is higher than 90% or 95%.
23. A method of identifying an affinity binder to a target, comprising: screening a population of first hosts each containing the display vector of claim 12, to obtain a subpopulation of first hosts having binding affinity to a target, wherein each first host displays, on a surface of said first host, the amino acid sequence encoded by a unique first nucleic acid sequence in the display vector, and wherein said subpopulation of first hosts each display an affinity binder to said target; converting display vectors isolated from said subpopulation of first hosts to expression vectors by cleaving the first and second restriction site to remove the second nucleic acid sequence and religating the compatible sticky ends produced therefrom; and introducing said expression vectors into a population of second hosts, to further characterize the affinity binder.
24. The method of claim 23, performed in a high-throughput fashion.
25. The method of claim 23, wherein a conversion rate of the converting step is higher than 90% or 95%.